跳到主要内容

Speed up hive queries with databend

· 阅读需 2 分钟

Backgroup

now databend support hive catalog to run hive queries, this docs shows how to set up databend-hive environment and run hive sqls.

How to set up databend-hive cluster

hiveserver&metastore&hdfs is supposed to be pre-installed.

  1. download a databend-release with hive support, or build from source
## make sure JAVA_HOME is set
export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
cargo build --features hive,storage-hdfs
  1. setup a databend cluster, refer to deploying-databend
  2. add hive catalog and hdfs storage to databend-query.toml
[storage]
type = "hdfs"

[storage.hdfs]
# hdfs namenode address,such as 127.0.0.1:8020
name_node = "xx"
root = ""

[catalogs.hive]
type = "hive"
# hive metastore address, such as 127.0.0.1:9083
address = "xx"

  1. run databend-query with java&hadoop environment
export HADOOP_HOME=xxx
export JAVA_HOME=xxx, such as /Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH

./bin/databend-query -c ./databend-query.toml > query.log 2>&1 &
  1. setup hive related settings with mysql client
set global sql_dialect = 'hive';

suggest settings:

-- for chinese users
set global timezone = 'Asia/Shanghai';
set global max_execute_time = 180000;

-- support hive nvl function
create FUNCTION nvl as (a,b) -> ifnull(a,b);
  1. query hive data using mysql client or mysql jdbc client. Note: hive tables must be referred as hive.db.table
select * from hive.$db.$table limit 10;

Limititions

  1. only support parquet table, not support orc,txt
  2. not support struct&map&decimal hive data types
  3. only support hive select queries, not support DDL, insert, DML sqls
  4. not support hive udfs, hive functions are limited supported

hive features is now in beta stage, please feel free to report bugs&suggestions in databend issues.